Introduction to Medical Statistics 2024
Exercise class IX
Sample Size Calculation and Repetition
I: Sample size calculation via formulas
Investigators want to run an RCT that compares a new chemotherapy with an old one. They use 6-month tumour response as the primary endpoint and need to perform a sample size/power calculation. The investigators assume that the tumour response probability is 20% for the old chemotherapy (“control” arm) and 40% for the new one (“intervention” arm).
a. How many patients per arm are required in order to have a power of 90% to detect the assumed increase in the tumour response probability at the two-sided 5% significance level? Use the R function power.prop.test and round the result up to the next whole number.
b. Assume that in reality, the new therapy increases the tumour response probability only to 30%. What power will the RCT have to get a significant result with the sample size calculated in a.? Use power.prop.test again, but now use the argument n=… instead of power=… and set the sample size per group to the value we found in a.
c. The new chemotherapy is also expected to cause less anaemia. As a secondary endpoint, the haemoglobin value after the first chemotherapy cycle was chosen. It is expected that mean haemoglobin at that time point is 1 g/dl higher in the intervention arm and that the standard deviation of haemoglobin values is 2.5 g/dl in both arms. What power does the RCT with the sample size from a. have to give a significant result at the two-sided 5% significance level for the secondary endpoint under these assumptions? Do you have a wild guess of the power before you perform the calculation? Use power.t.test (again setting the sample size per group n=… and not setting a value for the power argument).
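The three calculations above can be sketched with the two base R functions named in the questions. This is a minimal sketch, not an official solution: the object names (res_a, n_per_group, res_b, res_c) are assumptions chosen for illustration, and both functions are left at their default two-sided alternative.

```r
# a. Sample size per group: 20% vs 40% response, 90% power, two-sided 5% level
res_a <- power.prop.test(p1 = 0.2, p2 = 0.4, power = 0.9, sig.level = 0.05)
n_per_group <- ceiling(res_a$n)   # round up to a whole patient
n_per_group

# b. Power with the same sample size if the true response probability is only 30%
res_b <- power.prop.test(n = n_per_group, p1 = 0.2, p2 = 0.3, sig.level = 0.05)
res_b$power

# c. Power for the haemoglobin endpoint: mean difference 1 g/dl, SD 2.5 g/dl
res_c <- power.t.test(n = n_per_group, delta = 1, sd = 2.5, sig.level = 0.05)
res_c$power
```

Rounded up, the first call should reproduce the N=109 per group that is used in part II. Note that n in both functions is the number of patients per group, not the total trial size.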
II: Sample size calculation via simulation
In the real world we typically have one single data set. However, on our computer we can create a hypothetical situation in which we generate as many artificial data sets as we want. Statisticians often perform such a simulation study in order to investigate the performance of a newly developed statistical method.
Simulation studies can also be used in sample size/power calculations, because these consider a hypothetical situation before any data are collected. Instead of using ready-to-use software for power and sample size calculations, we can use our computer as our laboratory: we generate RCTs on our computer and look at the distribution of the outcomes.
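We use the setting of question I.a. We first set some values: the sample size per group N=109 and the tumour response probabilities 0.2 (control arm) and 0.4 (intervention arm).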
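As a starting point, the values and the two base R functions used in the first questions of this part (dbinom for the theoretical distribution, rbinom for sampling) might be set up as follows. This is a sketch under assumptions: prob1 is the name used in the text, while prob2, x, pmf, y, counts and the seed are illustrative choices, not part of the exercise material.

```r
# Values from question I.a
N     <- 109   # sample size per group
prob1 <- 0.2   # response probability, control arm
prob2 <- 0.4   # response probability, intervention arm (name is an assumption)

# Theoretical distribution of the number of responses among N patients
x   <- 0:N
pmf <- dbinom(x, size = N, prob = prob1)
plot(x, pmf, type = "h",
     xlab = "Number of tumour responses", ylab = "Probability")

set.seed(2024)  # arbitrary seed, only for reproducibility

# One simulated data set: N binary 0-1 outcomes
y <- rbinom(N, size = 1, prob = prob1)
sum(y)    # number of responses
mean(y)   # proportion of responses

# M repeated experiments, each returning the number of responses out of N
M      <- 100
counts <- rbinom(M, size = N, prob = prob1)
emp    <- table(counts) / M   # empirical relative frequencies
points(as.integer(names(emp)), as.numeric(emp), col = "orange", pch = 16)
```

Dividing x by N on the horizontal axis gives the corresponding plot for the proportion of responses.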
a. Visualize the probability distribution of the number of observed tumour responses with a sample of N=109 and a response probability of prob1=0.2. Do the same for the proportion of observed tumour responses (code not given).
b. Sample N=109 tumour outcomes under the response probability prob1=0.2 and compute the number of responses. Locate this number in the above figure. Is this number likely to occur? (Note: this is random, so each of you may have a slightly different answer.) What is the proportion of tumour response in your generated data set? We use the rbinom function, generating N binary 0-1 outcomes.
c. If we want to get an idea of the distribution of the number of successes, we repeat the experiment many times. We compute the number of tumour responses in each experiment and summarize over the experiments. Note that when generating the data with the rbinom function, we now specify M binomial experiments each of size N=109, with the total number of “successes” per experiment as outcome. Set M=100. Can you explain the difference between the generated data (in orange) and the theoretical distribution (in black)?
d. We return to the sample size and power calculations. We simulate the RCT M=1,000,000 times. In each “experiment”, we generate N individuals for the control group and N for the intervention group, the latter once under \(H_0\) and once under the assumed value \(p=0.4\) of \(H_1\). We collect all results in a data.frame. What do you observe?
e. Under the null hypothesis, the tumour response probability is 0.2 in both the intervention group (tr) and the control group (pl). Since the number of patients in each group is the same, we can use the difference between the numbers of individuals with tumour response in the two groups as test statistic, \(D=X_{tr}-X_{pl}\). To find the values of \(D\) for which we reject \(H_0\), we need the distribution of \(D\) under the null hypothesis. \(D\) doesn’t have any standard distribution that you know of (although it can be approximated by a normal distribution). However, we can use our simulated data! Summarize and plot the distribution of the difference in number of tumour responses under \(H_0\). At a two-sided significance level of \(\alpha=0.05\), how large should the difference be in order to decide that \(H_0\) is not correct (this value is called the “critical value”)?
f. Compute the power if N=109 and p=0.4, i.e. the percentage of “experiments” that lead to rejection of \(H_0\) if p=0.4.
g. We computed the power for a sample size per group of N=109. However, we can use the same approach to do a sample size calculation. For this, we vary \(N\) and repeat the calculations as in d./e./f. The most efficient way of doing this is by first creating a function that performs the essential steps. I created this function for you. Run the code in the chunk below in your R session; this will make the function available for your use. Check whether it gives the same value for the power as above if N=109.
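The simulation steps for N=109 (generating the trials, finding the critical value from the distribution of \(D\) under \(H_0\), and computing the power) can be sketched as follows. This is a minimal sketch, not the code supplied with the exercise: the object names (sim, crit, power_109) and the seed are assumptions, and M is reduced to 100,000 here to keep the run short.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
N  <- 109            # patients per group
M  <- 100000         # number of simulated trials (reduced from 1,000,000)
p0 <- 0.2            # response probability under H0 (both arms)
p1 <- 0.4            # assumed response probability in the intervention arm under H1

# One row per simulated trial: number of responses per arm
sim <- data.frame(
  pl    = rbinom(M, size = N, prob = p0),  # control arm
  tr_H0 = rbinom(M, size = N, prob = p0),  # intervention arm under H0
  tr_H1 = rbinom(M, size = N, prob = p1)   # intervention arm under H1
)
sim$D_H0 <- sim$tr_H0 - sim$pl
sim$D_H1 <- sim$tr_H1 - sim$pl

summary(sim$D_H0)
hist(sim$D_H0, breaks = seq(min(sim$D_H0) - 0.5, max(sim$D_H0) + 0.5, by = 1),
     main = "D under H0", xlab = "Difference in number of responses")

# Critical value: smallest k with simulated P(|D| >= k | H0) <= alpha
alpha     <- 0.05
cand      <- 0:N
tail_prob <- vapply(cand, function(k) mean(abs(sim$D_H0) >= k), numeric(1))
crit      <- min(cand[tail_prob <= alpha])
crit

# Power: share of trials under H1 in which |D| reaches the critical value
power_109 <- mean(abs(sim$D_H1) >= crit)
power_109
```

Because \(D\) only takes integer values, the simulated rejection probability under \(H_0\) at this critical value will usually lie somewhat below 5% rather than exactly at it.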
Now we can compute the power for different sample sizes. If we have no idea of the required sample size, we can choose a range, say from 50 to 200. In order to save time, we reduce M to 100,000. (This calculation may take about one minute.) What do you conclude with respect to the sample size needed to obtain a power of 90%?
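One way to organise this calculation is to wrap the simulation in a function and evaluate it over a grid of sample sizes. The helper sim_power below is hypothetical and is not the function supplied with the exercise; its name, arguments, the grid step of 10 and the seed are all assumptions made for this sketch.

```r
# Hypothetical helper (NOT the function supplied with the exercise):
# simulated power of the test based on D = X_tr - X_pl for N patients per group
sim_power <- function(N, M = 1e5, p0 = 0.2, p1 = 0.4, alpha = 0.05) {
  pl   <- rbinom(M, size = N, prob = p0)        # control arm
  d_h0 <- rbinom(M, size = N, prob = p0) - pl   # D under H0
  d_h1 <- rbinom(M, size = N, prob = p1) - pl   # D under H1
  # tail_prob[k + 1] = simulated P(|D| >= k | H0) for k = 0, ..., N
  counts    <- tabulate(abs(d_h0) + 1, nbins = N + 1)
  tail_prob <- rev(cumsum(rev(counts))) / M
  crit      <- min(which(tail_prob <= alpha)) - 1   # critical value
  mean(abs(d_h1) >= crit)                           # simulated power
}

set.seed(2024)                      # arbitrary seed
N_grid     <- seq(50, 200, by = 10) # candidate sample sizes per group
power_grid <- sapply(N_grid, sim_power)

result <- data.frame(N = N_grid, power = round(power_grid, 3))
result

plot(N_grid, power_grid, type = "b",
     xlab = "Sample size per group", ylab = "Simulated power")
abline(h = 0.9, lty = 2)            # target power of 90%
```

The smallest N whose simulated power reaches the dashed line gives a candidate sample size; a finer grid around that value and a larger M would reduce the remaining simulation error.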
III: Analysis of RCTs/Repetition of statistical tests
A randomized controlled clinical trial compared a new procalcitonin (biomarker)-guided antibiotics administration with antibiotics administration according to standard of care in patients admitted to hospital with lower respiratory tract infections. A total of 442 patients were randomized. The primary endpoint of the trial was antibiotics prescription within 30 days of hospitalization (i.e. did the patient get any antibiotics during that time interval or not). An important secondary endpoint was the length of hospital stay (in days). The results of the trial are in the dataset pctTrial.csv, with a short description in the file pctTrial_description.txt.
a. Summarize the antibiotics prescription probabilities in the two arms and compare them with an appropriate statistical test. What does the code below do? What do you conclude?
b. Summarize the length of hospital stay in the two arms and compare them with an appropriate test.
c. Plot the relation between the two biomarkers pct and proADM. Should the biomarkers be transformed prior to the analysis? What is the correlation between the two markers?
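The three analyses above might be approached along the following lines. This is only a sketch on made-up placeholder data, not the trial data and not the code supplied with the exercise: the column names arm, antibiotics and los are assumptions (pct and proADM are the names used in the text), so check pctTrial_description.txt for the real variable names before adapting it. The choice of tests (chi-squared test, Wilcoxon rank-sum test, Spearman correlation) is one reasonable option, not the only one.

```r
# Placeholder data with ASSUMED column names; with the real file one would
# instead read the data, e.g. dat <- read.csv("pctTrial.csv")
set.seed(42)
n   <- 442
dat <- data.frame(
  arm         = rep(c("guided", "standard"), each = n / 2),  # assumed name
  antibiotics = rbinom(n, size = 1, prob = 0.5),             # assumed name, 0/1
  los         = rpois(n, lambda = 8) + 1,                    # assumed name, days
  pct         = rlnorm(n),                                   # biomarker
  proADM      = rlnorm(n)                                    # biomarker
)

# Binary endpoint: proportions per arm and chi-squared test
tab <- table(dat$arm, dat$antibiotics)
prop.table(tab, margin = 1)
test_ab <- chisq.test(tab)
test_ab$p.value

# Length of stay: summaries per arm and a rank-based test,
# which does not rely on normally distributed (often skewed) stay lengths
tapply(dat$los, dat$arm, summary)
test_los <- wilcox.test(los ~ arm, data = dat)
test_los$p.value

# Biomarkers: scatter plot on the log scale and a rank correlation,
# which is unchanged by monotone transformations such as the logarithm
plot(log(dat$pct), log(dat$proADM), xlab = "log(pct)", ylab = "log(proADM)")
rho <- cor(dat$pct, dat$proADM, method = "spearman")
rho
```

Since the placeholder data are generated independently of the arm, the printed p-values and correlation carry no meaning; only the structure of the calls is intended to transfer to the real data set.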